Hybrid Block Diagonalization for Massive Multiuser MIMO Systems

Weiheng Ni and Xiaodai Dong 


Abstract —For a massive multiple-input multiple-output 
(MIMO) system, restricting the number of RF chains to far less 
than the number of antenna elements can significantly reduce 
the implementation cost compared to the full complexity RF 
chain configuration. In this paper, we consider the downlink communication of a massive multiuser MIMO (MU-MIMO) system and propose a low-complexity hybrid block diagonalization (Hy-BD) scheme to approach the capacity performance of the traditional BD processing method. We aim to harvest the large array gain through phase-only RF precoding and combining, and then perform digital BD processing on the equivalent baseband channel. The proposed Hy-BD scheme is examined in both large Rayleigh fading channels and millimeter wave (mmWave) channels. A performance analysis is further conducted for single-path channels with a large number of transmit and receive antennas. Finally, simulation results demonstrate
that our Hy-BD scheme, with a lower implementation and 
computational complexity, achieves a capacity performance that 
is close to (sometimes even higher than) that of the traditional 
high-dimensional BD processing. 

Index Terms —Massive MIMO, large scale MU-MIMO, hybrid 
processing, block diagonalization, limited RF chains, mmWave

I. Introduction 

To realize the tremendous capacity target of next generation mobile cellular systems, one promising option is scaling up to massive multiple-input multiple-output (MIMO) systems. In massive multiuser MIMO (MU-MIMO) systems, some simple linear pre/post-processing (transmit precoding/receive combining) schemes, such as zero-forcing (ZF) and linear minimum mean-square error (MMSE), are able to approach the optimal capacity performance achieved by dirty paper coding (DPC) as the number of antennas goes to infinity [5]. Moreover, the ZF processing that cancels the inter-user interference through channel inversion can be generalized as block diagonalization (BD) when the base stations (BSs) and mobile stations (MSs) are both equipped with multiple antennas [6]. For downlink spatial multiplexing in MU-MIMO systems, the BD method achieves sub-optimal capacity performance; however, it reduces the complexity of the transmitter and receiver structures by providing closed-form precoder and combiner solutions. From a different perspective, the problems of downlink beamformer design for signal-to-interference-plus-noise ratio balancing and of downlink physical layer multicasting that aims at minimizing the transmit power in massive MIMO systems have been investigated in [7] and [8] respectively. Reference [9] presents a low-complexity algorithm for detection in massive MIMO systems based on the likelihood ascent search (LAS) algorithm.

In large-scale MIMO systems, the large array gain is rendered by a massive number of antennas, on the order of a hundred or more El. Conventional pre-processing modifies the amplitudes and phases of the complex transmit symbols at the baseband; the symbols are then upconverted to the passband through radio frequency (RF) chains (including digital-to-analog conversion, signal mixing and power amplification). This requires the number of RF chains to be in the range of hundreds, equal to the number of antenna elements. Post-processing is similar, involving a large number of analog receive RF chains and digital baseband operations. This leads to unacceptably high implementation cost and energy consumption.

Recently, enabled by cost-effective variable phase shifters, a limited number of RF chains have been applied in MIMO systems [10]-[13]. The analog RF processing provides high-dimensional phase-only control while the digital baseband processing can be performed in a very low dimension, which is termed hybrid processing. Under the limited RF chains constraint, references [10] and [11] investigate hybrid processing schemes in point-to-point (P2P) MIMO systems. A single-stream communication under Rayleigh fading MIMO channels achieves the full diversity order through equal gain transmission/combining (EGT/EGC) in [10], while multiple-stream transmission under MIMO channels is proposed in [11]. In addition, [12] and [13] apply hybrid processing to the downlink of massive MU-MIMO systems with single-antenna users. In [12], near-optimal capacity performance, compared to the full-complexity systems, is achieved through ZF baseband precoding combined with EGT processing in the RF domain. Note that this technique also works for the millimeter wave (mmWave) channel. In [13], phase-only RF precoding is employed to maximize the minimum average data rate of users via a bi-convex approximation approach.

Furthermore, in mmWave communication systems, it is possible to build a large antenna array in a compact region and apply the hybrid processing technique [14]-[18]. The "dominant" paths in P2P mmWave channels are captured through hybrid processing in [14] and [15], where the former considers single-stream transmission while the latter enables multiple-stream communication. [15] presents a hybrid processing scheme that decomposes the optimal precoding/combining matrix via orthogonal matching pursuit with the transmit/receive array response vectors as the basis vectors. Reference [14] can be regarded as a special one-RF-chain case of reference [15]. On the other hand, in mmWave MU-MIMO systems, [16] considers single-antenna users and designs the analog RF precoding based on the transmit beam directions, while the digital processing (matched filter, zero-forcing or Wiener filter) is performed on the baseband equivalent channels. With multiple-antenna users, some baseband processing schemes such as MMSE and BD are examined in [17], which, however, neglects the design of the analog RF processing. In addition, a comprehensive limited feedback hybrid precoding scheme is proposed to configure hybrid precoders at the transmitter and analog combiners with a small training and feedback overhead, which is also effective for multiple-antenna users who have only one RF chain [18].

In this paper, we consider the downlink communication of a massive MU-MIMO system where the BS and all MSs have multiple antennas. With a limited number (> 1) of RF chains at the BS and MSs, hybrid processing is applied as an alternative to the traditional high-cost full-dimensional RF and baseband processing. We propose to utilize RF precoding and combining to harvest the large array gain provided by the large number of antennas in massive MU-MIMO channels, which shares a similar objective with the above references that study hybrid processing in MU-MIMO systems. However, an analog RF processing design for MU-MIMO systems with multiple-antenna MSs accommodating multiple data streams per MS is not available in the literature, and the novel BS RF precoder design is based on a newly defined "aggregate intermediate channel". More specifically, the RF combiners of all the MSs are obtained by selecting some of the discrete Fourier transform (DFT) bases, while the RF precoder of the BS is designed by extracting the phases of the conjugate transpose of the aggregate intermediate channel, which incorporates the MS RF combiners and the original downlink channels. With the designed RF precoder and combiners, a low-dimensional BD processing can be performed at the baseband to cancel the inter-user interference, and the whole operation is named the hybrid BD (Hy-BD) scheme. The advantages of such a Hy-BD scheme can be summarized as follows:

1) Low implementation cost and low computational complexity;

2) Applicability to both Rayleigh fading and mmWave massive MU-MIMO channels. Channel state information (CSI) is required but not the information of each individual propagation path;

3) Reduction on the feedback overhead of the RF domain 
operations. 

Simulation results demonstrate that the proposed Hy-BD scheme achieves a capacity performance that is quite close to, and sometimes even higher than, that of the full-complexity BD scheme in [6], with a lower implementation and computational cost. The Hy-BD scheme is also examined in mmWave MU-MIMO communication channels and compared to the spatially sparse precoding/combining method [15], initially proposed for SU-MIMO but extended to MU-MIMO in this paper.


II. System Model 

A. System Model 

We consider the downlink communication of a massive multiuser MIMO system shown in Fig. 1, where a base station with $N_{BS}$ antennas and $M_{BS}$ RF chains is assumed to schedule $K$ mobile stations. Each MS is equipped with $N_{MS}$ antennas and $M_{MS}$ RF chains to support $N_S$ data streams, which means a total of $KN_S$ data streams are handled by the BS. To guarantee the effectiveness of the communication carried by the limited number of RF chains, the number of transmitted streams is constrained by $KN_S \le M_{BS} \le N_{BS}$ for the BS and $N_S \le M_{MS} \le N_{MS}$ for each MS.



Fig. 1: System diagram of a massive MU-MIMO system with hybrid processing structure.

At the BS, the transmitted symbols are assumed to be processed by a baseband precoder $\mathbf{B}$ of dimension $M_{BS} \times KN_S$ and then by an RF precoder $\mathbf{F}$ of dimension $N_{BS} \times M_{BS}$. Notably, the baseband precoder $\mathbf{B}$ enables both amplitude and phase modification, while only phase changes (phase-only control) can be realized by $\mathbf{F}$ since it is implemented using analog phase shifters. Each entry of $\mathbf{F}$ is normalized to satisfy $|F^{(i,j)}| = 1/\sqrt{N_{BS}}$, where $|F^{(i,j)}|$ denotes the amplitude of the $(i,j)$-th element of $\mathbf{F}$. Furthermore, to meet the total transmit power constraint, $\mathbf{B}$ is normalized to satisfy $\|\mathbf{F}\mathbf{B}\|_F^2 = KN_S$, where $\|\cdot\|_F$ denotes the Frobenius norm.

We assume a narrowband flat fading channel model and obtain the received signal of the $k$-th MS

$$\mathbf{y}_k = \mathbf{H}_k\mathbf{F}\mathbf{B}\mathbf{s} + \mathbf{n}_k, \quad k = 1, 2, \ldots, K, \tag{1}$$

where $\mathbf{s} \in \mathbb{C}^{KN_S \times 1}$ is the signal vector for a total of $K$ MSs, each of which processes an $N_S \times 1$ signal vector $\mathbf{s}_k$. Namely, $\mathbf{s} = [\mathbf{s}_1^T, \mathbf{s}_2^T, \ldots, \mathbf{s}_K^T]^T$, where $(\cdot)^T$ denotes transpose. The signal vector satisfies $E[\mathbf{s}\mathbf{s}^H] = \frac{P}{KN_S}\mathbf{I}_{KN_S}$, where $(\cdot)^H$ denotes conjugate transpose, $E[\cdot]$ denotes expectation, $P$ is the average transmit power and $\mathbf{I}_{KN_S}$ is the $KN_S \times KN_S$ identity matrix. $\mathbf{H}_k \in \mathbb{C}^{N_{MS} \times N_{BS}}$ is the channel matrix for the $k$-th MS, and $\mathbf{n}_k$ is the $N_{MS} \times 1$ vector of i.i.d. $\mathcal{CN}(0, \sigma^2)$ additive complex Gaussian noise. The processed received signal at the $k$-th MS after combining is given by

$$\tilde{\mathbf{y}}_k = \mathbf{M}_k^H\mathbf{W}_k^H\mathbf{H}_k\mathbf{F}\mathbf{B}\mathbf{s} + \mathbf{M}_k^H\mathbf{W}_k^H\mathbf{n}_k, \quad k = 1, 2, \ldots, K, \tag{2}$$

where $\mathbf{W}_k$ is the $N_{MS} \times M_{MS}$ RF combining matrix and $\mathbf{M}_k$ is the $M_{MS} \times N_S$ baseband combining matrix for the $k$-th MS. Since $\mathbf{W}_k$ is also implemented by analog phase shifters, all elements of $\mathbf{W}_k$ should have constant amplitude such that $|W_k^{(i,j)}| = 1/\sqrt{N_{MS}}$.

We define an equivalent baseband channel for each MS as

$$\tilde{\mathbf{H}}_k = \mathbf{W}_k^H\mathbf{H}_k\mathbf{F}, \quad k = 1, 2, \ldots, K, \tag{3}$$

and the entire equivalent multiuser baseband channel can be denoted as

$$\mathbf{H}_{eq} = \left[\tilde{\mathbf{H}}_1^T, \tilde{\mathbf{H}}_2^T, \ldots, \tilde{\mathbf{H}}_K^T\right]^T. \tag{4}$$

Then the processed received signal at the $k$-th MS can also be represented as

$$\tilde{\mathbf{y}}_k = \mathbf{M}_k^H\tilde{\mathbf{H}}_k\mathbf{B}_k\mathbf{s}_k + \underbrace{\sum_{i=1, i \ne k}^{K}\mathbf{M}_k^H\tilde{\mathbf{H}}_k\mathbf{B}_i\mathbf{s}_i}_{\text{interference}} + \underbrace{\mathbf{M}_k^H\mathbf{W}_k^H\mathbf{n}_k}_{\text{noise}}, \quad k = 1, 2, \ldots, K, \tag{5}$$

where $\mathbf{B}_k$ is the $((k-1)N_S+1)$-th to the $(kN_S)$-th columns of $\mathbf{B}$, corresponding to the baseband precoding for $\mathbf{s}_k$. When Gaussian symbols are used by the BS, the sum spectral efficiency achieved will be

$$R = \sum_{k=1}^{K}\log_2\left|\mathbf{I}_{N_S} + \frac{P}{KN_S}\mathbf{R}_k^{-1}\mathbf{M}_k^H\tilde{\mathbf{H}}_k\mathbf{B}_k\mathbf{B}_k^H\tilde{\mathbf{H}}_k^H\mathbf{M}_k\right|, \tag{6}$$

where $\mathbf{R}_k = \frac{P}{KN_S}\sum_{i=1, i \ne k}^{K}\mathbf{M}_k^H\tilde{\mathbf{H}}_k\mathbf{B}_i\mathbf{B}_i^H\tilde{\mathbf{H}}_k^H\mathbf{M}_k + \sigma^2\mathbf{M}_k^H\mathbf{W}_k^H\mathbf{W}_k\mathbf{M}_k$ is the covariance matrix of both interference and noise.

Generally, joint optimization of the RF and baseband precoders and combiners would be the natural way to design a processing scheme that achieves the optimal sum spectral efficiency $R$. However, as stated in ca, finding global optima for similar constrained joint optimization problems (maximizing $R$ while constant-amplitude constraints are imposed on the RF analog precoder and combiners) is often found to be intractable. Even in traditional MU-MIMO systems without a hybrid processing structure, enormous effort is needed to find a local optimum of the sum rate by alternating optimization [19]. For some recently designed hybrid processing schemes ifT^lfTTl - lfTSl in the literature, separated RF and baseband processing designs are investigated to obtain satisfactory performance without involving a myriad of iterative procedures. Therefore, we choose to separate the RF and baseband domain designs in this paper.

B. Channel Model

In this paper, the general channel matrix is set as $\mathbf{H}_k = \sqrt{\beta_k}\,\bar{\mathbf{H}}_k$, where $\beta_k$ and $\bar{\mathbf{H}}_k$ indicate the large scale path fading and the normalized channel matrix respectively for the $k$-th MS, satisfying $E[\|\bar{\mathbf{H}}_k\|_F^2] = N_{BS}N_{MS}$. With the knowledge of the general channel matrix, we aim to seek the BS hybrid precoders $(\mathbf{F}, \mathbf{B})$ and the hybrid combiners $(\mathbf{W}_k, \mathbf{M}_k)$ for all $K$ MSs through the Hy-BD scheme, which achieves a sub-optimal spectral efficiency for massive MU-MIMO systems by perfectly canceling the inter-user interference. Two kinds of channel models are considered in this paper:

1) large i.i.d. Rayleigh fading channel $\mathbf{H}_{rf}$,

2) limited scattering mmWave channel $\mathbf{H}_{mmw}$.

In the large Rayleigh fading channel, which is commonly considered in massive MU-MIMO systems, all entries of the normalized channel matrix $\bar{\mathbf{H}}_k$ for the $k$-th MS follow i.i.d. $\mathcal{CN}(0, 1)$. On the other hand, a large antenna array is often implemented in mmWave communications to combat the high free-space pathloss M-M-. We adopt the clustered mmWave channel model to characterize the limited scattering feature of the mmWave channel. The normalized mmWave downlink channel for the $k$-th MS is assumed to be the sum of all propagation paths that are scattered in $N_C$ clusters, where each cluster contributes $N_P$ paths, which can be expressed as

$$\bar{\mathbf{H}}_k = \sqrt{\frac{N_{BS}N_{MS}}{N_CN_P}}\sum_{i=1}^{N_C}\sum_{l=1}^{N_P}\alpha_{il}^k\,\mathbf{a}_{MS}(\theta_{il}^k)\,\mathbf{a}_{BS}(\phi_{il}^k)^H, \tag{7}$$

where $\alpha_{il}^k$ is the complex gain of the $l$-th path in the $i$-th cluster, which follows $\mathcal{CN}(0, 1)$. To reflect the sparsity of the mmWave channel, neither $N_C$ nor $N_P$ should be too large. For the $(i, l)$-th path, $\theta_{il}^k$ and $\phi_{il}^k$ are the azimuth angles of arrival/departure (AoA/AoD), while $\mathbf{a}_{MS}(\theta_{il}^k)$ and $\mathbf{a}_{BS}(\phi_{il}^k)$ are the receive and transmit array response vectors at the azimuth angles of $\theta_{il}^k$ and $\phi_{il}^k$ respectively, and the elevation dimension is ignored. Within the cluster $i$, $\theta_{il}^k$ and $\phi_{il}^k$ have the uniformly-distributed mean values of $\theta_i^k$ and $\phi_i^k$ respectively, while the lower and upper bounds of the uniform distribution for $\theta_i^k$ and $\phi_i^k$ can be defined as $[\theta_{\min}^k, \theta_{\max}^k]$ and $[\phi_{\min}^k, \phi_{\max}^k]$. The angle spreads (standard deviations) of $\theta_{il}^k$ and $\phi_{il}^k$ among all clusters are assumed to be constant, denoted as $\sigma_\theta$ and $\sigma_\phi$. Finally, the truncated Laplacian distribution is employed to generate all the AoDs/AoAs for this mmWave propagation channel matrix, based on the above parameters.

The uniform linear array (ULA) is employed by the BS and MSs in our study, while the Hy-BD scheme in Section III can be applied directly to arbitrary antenna arrays. For an $N$-element ULA, the array response vector can be given by

$$\mathbf{a}_{ULA}(\theta) = \frac{1}{\sqrt{N}}\left[1, e^{j\frac{2\pi}{\lambda}d\sin(\theta)}, \ldots, e^{j(N-1)\frac{2\pi}{\lambda}d\sin(\theta)}\right]^T, \tag{8}$$

where $\lambda$ is the wavelength of the carrier, and $d$ is the distance between neighboring antenna elements. The array response vectors of the BS and MSs can be written in the form of (8). Furthermore, other non-ULA antenna geometries, such as the uniform planar array (UPA), are also examined in the simulations.
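As a minimal sketch of the ULA response vector in (8), the following standard-library Python snippet builds the unit-norm vector for a given angle. The function name `ula_response` and the half-wavelength default spacing `d_over_lambda=0.5` are assumptions made for illustration; the paper does not fix the spacing in this passage.

```python
import cmath
import math

def ula_response(n_elements, theta, d_over_lambda=0.5):
    """Unit-norm ULA array response vector of (8); theta is in radians.

    d_over_lambda is the element spacing d divided by the wavelength
    (half-wavelength spacing is an assumption of this sketch).
    """
    omega = 2.0 * math.pi * d_over_lambda * math.sin(theta)  # spatial frequency
    scale = 1.0 / math.sqrt(n_elements)
    return [scale * cmath.exp(1j * n * omega) for n in range(n_elements)]

# Example: 8-element array, 30 degree azimuth angle.
a = ula_response(8, math.radians(30.0))
norm_sq = sum(abs(x) ** 2 for x in a)  # equals 1 up to rounding
```

Every entry has amplitude $1/\sqrt{N}$, so the vector satisfies the same constant-amplitude property that the analog phase shifters impose on the columns of the RF precoder and combiners.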


III. Hybrid Block Diagonalization 

In massive MU-MIMO systems, the generalized zero-forcing method (i.e., the traditional BD scheme) is infeasible to implement in practice due to the high cost of requiring as many RF chains as antennas. By reducing the number of RF chains $M_{BS}$ ($M_{MS}$) to far less than the number of antenna elements $N_{BS}$ ($N_{MS}$) at both the BS and MSs, we propose to utilize the RF precoding matrix $\mathbf{F}$ at the BS and the RF combining matrix $\mathbf{W}_k$ at each MS to harvest the large array gain provided by the large number of antennas in the massive MU-MIMO channel. With the found $\mathbf{F}$ and all $\mathbf{W}_k$'s, the entire multiuser equivalent baseband channel $\mathbf{H}_{eq}$ can be determined based on (4), which consists of all the equivalent channels for the MSs, namely $\tilde{\mathbf{H}}_k$, $k = 1, 2, \ldots, K$. Finally, a low-dimensional BD processing, involving the design of $\mathbf{B}$ and all $\mathbf{M}_k$'s, can be performed at the baseband.


A. Array Gain Harvesting 


Owing to the large number of antennas in massive MU-MIMO systems, the channel gains of the equivalent channel $\mathbf{H}_{eq}$ can be scaled up through appropriate phase-only control in the RF domain, which is called the large array gain. Note that each element in $\mathbf{H}_{eq}$ represents the equivalent channel gain from one RF chain at the BS to one RF chain at one MS. To achieve high capacity with such a hybrid processing structure, the equivalent channel matrix $\mathbf{H}_{eq}$ is desired to have the following properties:

1) Rank sufficiency: $\mathbf{H}_{eq}$ should be well-conditioned to support multi-stream transmission, which means the rank of $\mathbf{H}_{eq}$ should be at least $KN_S$;

2) Large array gain: $\mathbf{H}_{eq}$ should sufficiently harvest the array gain so that it can provide as large a gain for each stream transmission as possible. We propose to pursue the large array gain by enlarging the sum of the squares of the diagonal entries in $\mathbf{H}_{eq}$.

By definition, $\mathbf{H}_{eq}$ consists of the equivalent channels of all the MSs, namely $\tilde{\mathbf{H}}_k = \mathbf{W}_k^H\mathbf{H}_k\mathbf{F}$, $k = 1, 2, \ldots, K$. We design the RF domain processing matrices $\mathbf{W}_k$'s and $\mathbf{F}$ and construct the equivalent channel $\mathbf{H}_{eq}$ by approximately satisfying the above two requirements, which will lead to a suboptimal performance under the hybrid precoding structure, but with significantly lower complexity.

Assume that all the RF combiners $\mathbf{W}_k$'s are given (the actual design of the $\mathbf{W}_k$'s will be presented shortly). Define an aggregate intermediate channel given by

$$\mathbf{H}_{int} = \begin{bmatrix}\mathbf{W}_1^H\mathbf{H}_1 \\ \vdots \\ \mathbf{W}_K^H\mathbf{H}_K\end{bmatrix}_{KM_{MS} \times N_{BS}}, \tag{9}$$

and then the baseband equivalent channel is $\mathbf{H}_{eq} = \mathbf{H}_{int}\mathbf{F}$. Due to the phase shifting ability of the RF precoder and the knowledge of the channel matrix entries, we perform the phase-only RF precoding based on an equal gain transmission (EGT) method proposed in ifT^ to harvest the large array gain, by setting

$$F^{(i,j)} = \frac{1}{\sqrt{N_{BS}}}e^{j\psi_{i,j}}, \tag{10}$$

where $\psi_{i,j}$ is the phase of the $(i, j)$-th element of the conjugate transpose of $\mathbf{H}_{int}$. This EGT precoding method requires $M_{BS} = KM_{MS}$ RF chains at the BS, which means $\mathbf{F}$ is an $N_{BS} \times KM_{MS}$ matrix and $\mathbf{H}_{eq}$ should be a square matrix. The entries along the diagonal of the baseband equivalent channel $\mathbf{H}_{eq}$ denote the equivalent channel gains in terms of the RF chains, while the remaining entries indicate the inter-chain interference. We focus on the large array gain design through the RF precoding/combining and leave the interference canceling to the baseband processing in the Hy-BD scheme.
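The EGT rule in (10) can be illustrated with a small numerical sketch. The snippet below is a standard-library Python illustration under stated assumptions: the helper names `egt_precoder` and `matmul` and the 2 x 4 example matrix are hypothetical and chosen only to make the behavior visible. It extracts the phases of $\mathbf{H}_{int}^H$ and forms $\mathbf{H}_{eq} = \mathbf{H}_{int}\mathbf{F}$.

```python
import cmath
import math

def egt_precoder(h_int):
    """Phase-only RF precoder of (10).

    h_int is a list of rows (K*M_MS rows, N_BS columns). The result F has
    N_BS rows and K*M_MS columns, with F[i][j] = exp(j*psi_ij)/sqrt(N_BS),
    where psi_ij is the phase of the (i, j)-th entry of H_int^H.
    """
    n_rows, n_bs = len(h_int), len(h_int[0])
    scale = 1.0 / math.sqrt(n_bs)
    # (H_int^H)[i][j] = conj(H_int[j][i]), so its phase is -phase(H_int[j][i]).
    return [[scale * cmath.exp(-1j * cmath.phase(h_int[j][i]))
             for j in range(n_rows)] for i in range(n_bs)]

def matmul(a, b):
    """Plain matrix product for lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

# Hypothetical aggregate intermediate channel with K*M_MS = 2 and N_BS = 4.
h_int = [[1 + 1j, -2, 0.5j, 3],
         [2 - 1j, 1j, -1, 1 + 2j]]
f_rf = egt_precoder(h_int)
h_eq = matmul(h_int, f_rf)  # square 2 x 2 baseband equivalent channel
```

Because every term of a row of $\mathbf{H}_{int}$ is co-phased by the matching column of $\mathbf{F}$, each diagonal entry of `h_eq` is real and equals the 1-norm of that row divided by $\sqrt{N_{BS}}$, while the off-diagonal entries are left for the baseband BD stage to cancel.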

Now let us return to the design of the RF combiners $\mathbf{W}_k$'s. Denote the $m$-th column of $\mathbf{W}_k$ as $\mathbf{w}_k^{(m)}$. As the result of the EGT precoding method, the $((k-1)M_{MS}+m)$-th diagonal entry of $\mathbf{H}_{eq}$, corresponding to the $m$-th RF chain of the $k$-th MS, is then given by $\frac{1}{\sqrt{N_{BS}}}\|(\mathbf{w}_k^{(m)})^H\mathbf{H}_k\|_1$, where $\|\cdot\|_1$ denotes the 1-norm of a vector. Note that the entries in $\mathbf{H}_{eq}$ indicate the RF-chain to RF-chain channel gains, and the off-diagonal entries indicate the inter-RF-chain, and even inter-user, interference. We aim to maximize the sum of the squares of the diagonal entries of the baseband equivalent channel $\mathbf{H}_{eq}$, given by $\frac{1}{N_{BS}}\sum_{k=1}^{K}\sum_{m=1}^{M_{MS}}\|(\mathbf{w}_k^{(m)})^H\mathbf{H}_k\|_1^2$, to pursue the large array gain. Due to the independence of the $\mathbf{W}_k$'s for all the MSs, maximizing this sum is equivalent to maximizing $\sum_{m=1}^{M_{MS}}\|(\mathbf{w}_k^{(m)})^H\mathbf{H}_k\|_1^2$ for all $k = 1, \ldots, K$ respectively. Hence, the design of the RF combiners can be obtained by solving

$$\max_{\mathbf{W}_k}\ \sum_{m=1}^{M_{MS}}\left\|(\mathbf{w}_k^{(m)})^H\mathbf{H}_k\right\|_1^2 \quad \text{s.t.}\ \ |W_k^{(i,j)}| = \frac{1}{\sqrt{N_{MS}}},\ \forall i, j. \tag{11}$$

Herein, we need to clarify that the simplified maximization problem in (11) is not designed to suppress any inter-user interference. As a heuristic method, it does not guarantee the optimality of the sum-rate maximization, but lends tractability to approaching a sub-optimal solution.
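In this paper, instead of solving the non-convex problem (11) directly, we modify the constraints to choose from a set of DFT bases, as explained in detail next. Note that $\|(\mathbf{w}_k^{(m)})^H\mathbf{H}_k\|_1^2 = \left(\sum_{n=1}^{N_{BS}}|(\mathbf{w}_k^{(m)})^H\mathbf{h}_k^{(n)}|\right)^2$, where $\mathbf{h}_k^{(n)}$ denotes the $n$-th column of $\mathbf{H}_k$. Moreover, the geometric MIMO channel models, including the Rayleigh fading* and mmWave channels, can be represented in the form of (7), which means $\mathbf{h}_k^{(n)}$ is a linear combination of all the array response vectors of the AoAs. This fact implies that each addition term in $\|(\mathbf{w}_k^{(m)})^H\mathbf{H}_k\|_1$, namely $|(\mathbf{w}_k^{(m)})^H\mathbf{h}_k^{(n)}|$, is the absolute value of the sum of weighted projections of the array response vectors $\mathbf{a}_{MS}(\theta_{il}^k)$ of all AoAs onto $\mathbf{w}_k^{(m)}$. From this perspective, we first propose to set $\mathbf{w}_k^{(m)}$ in the form of the array response vector (8) to extract the gain from these projections, namely,

$$\mathbf{d}(\omega) = \frac{1}{\sqrt{N_{MS}}}\left[1, e^{j\omega}, \ldots, e^{j(N_{MS}-1)\omega}\right]^T, \tag{12}$$

where $\omega = \frac{2\pi}{\lambda}d\sin\theta$ denotes the corresponding spatial frequency [20].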

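Furthermore, to meet the rank sufficiency requirement of $\mathbf{H}_{eq}$, it is desirable that the rank of $\mathbf{H}_k$ is not reduced after it is multiplied by $\mathbf{W}_k^H$. For this purpose, we require the columns of $\mathbf{W}_k$ to be pairwise orthogonal so that the rank of $\mathbf{W}_k^H\mathbf{H}_k$ is lower bounded by $M_{MS} \ge N_S$ (the rank of the high-dimensional $\mathbf{H}_k$ is assumed to be larger than $M_{MS}$), which means the equivalent channel $\mathbf{H}_{eq}$ is potentially capable of supporting the transmission of $KM_{MS} \ge KN_S$ streams. Considering the form of $\mathbf{d}(\omega)$, we discretize $\omega$ into $N_{MS}$ levels over $[0, 2\pi)$ and construct $N_{MS}$ bases, given by $\mathcal{D} = \{\mathbf{d}(0), \mathbf{d}(\frac{2\pi}{N_{MS}}), \ldots, \mathbf{d}(\frac{2\pi(N_{MS}-1)}{N_{MS}})\}$, as the candidates from which $\mathbf{w}_k^{(m)}$ is chosen. As we can see, these bases in $\mathcal{D}$ exactly form an $N_{MS}$-dimensional DFT basis set, which simultaneously conforms to the rank sufficiency and large array gain requirements of $\mathbf{H}_{eq}$. Therefore, we finally design the RF combiners by solving

$$\max_{\mathbf{W}_k}\ \sum_{m=1}^{M_{MS}}\left\|(\mathbf{w}_k^{(m)})^H\mathbf{H}_k\right\|_1^2 \quad \text{s.t.}\ \ \mathbf{w}_k^{(m)} \in \mathcal{D},\ m = 1, \ldots, M_{MS}. \tag{13}$$

*In the Rayleigh fading channel, all AoDs/AoAs of the paths (non-LOS) are uniformly distributed over $[0, 2\pi)$ and the number of paths approaches infinity.

To solve problem (13), we just need to sort all $N_{MS}$ values $\|\mathbf{d}(\omega)^H\mathbf{H}_k\|_1$ in descending order and then choose the first $M_{MS}$ $\mathbf{d}(\omega)$'s as the columns of $\mathbf{W}_k$. Note that each MS only needs to solve problem (13) once, with the corresponding index $k$, to obtain its RF combiner. In addition, the number of antennas $N_{MS}$ of an MS is usually much smaller than $N_{BS}$ due to the actual device size and computational capacity, which makes the exhaustive search over the DFT bases acceptable.

Remark 3.1: Based on the selection of the DFT bases, the MSs can avoid a large computational overhead for obtaining all the phase shift elements. In addition, only $N_S$ phase shift elements per MS need to be fed back to the BS, so that the BS is able to reconstruct all the $\mathbf{W}_k$'s and calculate the aggregate intermediate channel $\mathbf{H}_{int}$ for further processing.

B. Baseband Block Diagonalization

In this section, based on the obtained baseband equivalent channel $\mathbf{H}_{eq}$, given the found RF processing matrices $\mathbf{W}_k$ and $\mathbf{F}$, we perform the low-dimensional BD processing with the baseband precoder $\mathbf{B}$ and combiners $\mathbf{M}_k$'s to cancel the inter-user interference, which forces the interference terms $\tilde{\mathbf{H}}_k\mathbf{B}_i = \mathbf{0}$ for $i \ne k$ in (5). The spectral efficiency of the MU-MIMO system can be further simplified to

$$R = \sum_{k=1}^{K}\log_2\left|\mathbf{I}_{N_S} + \frac{P}{\sigma^2KN_S}\left(\mathbf{M}_k^H\mathbf{W}_k^H\mathbf{W}_k\mathbf{M}_k\right)^{-1}\mathbf{M}_k^H\tilde{\mathbf{H}}_k\mathbf{B}_k\mathbf{B}_k^H\tilde{\mathbf{H}}_k^H\mathbf{M}_k\right|. \tag{14}$$

To obtain the baseband precoder $\mathbf{B} = [\mathbf{B}_1, \mathbf{B}_2, \ldots, \mathbf{B}_K]$, where $\mathbf{B}_k$ incorporates the precoding vectors for the data streams of the $k$-th MS, we first define $\hat{\mathbf{H}}_k$ as

$$\hat{\mathbf{H}}_k = \left[\tilde{\mathbf{H}}_1^T, \ldots, \tilde{\mathbf{H}}_{k-1}^T, \tilde{\mathbf{H}}_{k+1}^T, \ldots, \tilde{\mathbf{H}}_K^T\right]^T. \tag{15}$$

$\mathbf{B}_k$ is supposed to lie in the null space of $\hat{\mathbf{H}}_k$. Denote the rank of $\hat{\mathbf{H}}_k$ as $r_k \le (K-1)M_{MS}$. Then the singular value decomposition (SVD) of $\hat{\mathbf{H}}_k$ is given by

$$\hat{\mathbf{H}}_k = \hat{\mathbf{U}}_k\hat{\boldsymbol{\Sigma}}_k\left[\hat{\mathbf{V}}_k^{((K-1)M_{MS})}, \hat{\mathbf{V}}_k^{(M_{MS})}\right]^H, \tag{16}$$

where $\hat{\mathbf{V}}_k^{((K-1)M_{MS})}$ consists of the first $(K-1)M_{MS}$ right singular vectors of $\hat{\mathbf{H}}_k$, and $\hat{\mathbf{V}}_k^{(M_{MS})}$ holds the remaining $M_{MS}$ ones, which are exactly the orthogonal bases of the null space of $\hat{\mathbf{H}}_k$. Then we know

$$\tilde{\mathbf{H}}_i\hat{\mathbf{V}}_k^{(M_{MS})} = \begin{cases}\mathbf{0}, & i \ne k, \\ \tilde{\mathbf{H}}_k\hat{\mathbf{V}}_k^{(M_{MS})}, & i = k.\end{cases} \tag{17}$$

Given the above results, block diagonalization of the baseband equivalent channel matrix to remove inter-user interference is written as

$$\mathbf{H}_{BD} = \mathbf{H}_{eq}\left[\hat{\mathbf{V}}_1^{(M_{MS})}, \ldots, \hat{\mathbf{V}}_K^{(M_{MS})}\right] = \begin{bmatrix}\tilde{\mathbf{H}}_1\hat{\mathbf{V}}_1^{(M_{MS})} & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \tilde{\mathbf{H}}_K\hat{\mathbf{V}}_K^{(M_{MS})}\end{bmatrix}. \tag{18}$$

At this point, all the MSs can perform inter-user-interference-free multi-stream transmission through their own sub-channels (the non-zero blocks in $\mathbf{H}_{BD}$). Further precoding/combining will be performed to achieve each MS's optimal spectral efficiency based on the SVD, given by

$$\tilde{\mathbf{H}}_k\hat{\mathbf{V}}_k^{(M_{MS})} = \mathbf{U}_k\boldsymbol{\Sigma}_k\mathbf{V}_k^H. \tag{19}$$

With the above rank sufficiency requirement, $\tilde{\mathbf{H}}_k\hat{\mathbf{V}}_k^{(M_{MS})}$ is an $M_{MS} \times M_{MS}$ full-rank sub-channel matrix which enables the transmission of $M_{MS} \ge N_S$ data streams for the $k$-th MS. Therefore, the optimal precoder and combiner on the $k$-th effective sub-channel $\tilde{\mathbf{H}}_k\hat{\mathbf{V}}_k^{(M_{MS})}$ should be $\mathbf{V}_k^{(N_S)}$ and $\mathbf{U}_k^{(N_S)}$, where $\mathbf{V}_k^{(N_S)}$ and $\mathbf{U}_k^{(N_S)}$ are the first $N_S$ columns of $\mathbf{V}_k$ and $\mathbf{U}_k$ respectively. Finally, the overall baseband precoder is given by

$$\mathbf{B} = \left[\hat{\mathbf{V}}_1^{(M_{MS})}, \ldots, \hat{\mathbf{V}}_K^{(M_{MS})}\right]\begin{bmatrix}\mathbf{V}_1^{(N_S)} & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{V}_K^{(N_S)}\end{bmatrix} = \left[\hat{\mathbf{V}}_1^{(M_{MS})}\mathbf{V}_1^{(N_S)}, \ldots, \hat{\mathbf{V}}_K^{(M_{MS})}\mathbf{V}_K^{(N_S)}\right]_{KM_{MS} \times KN_S}. \tag{20}$$

The baseband combiner for the $k$-th MS is given by $\mathbf{M}_k = \mathbf{U}_k^{(N_S)}$, $k = 1, 2, \ldots, K$.

The spectral efficiency achieved by the Hy-BD scheme finally becomes

$$R = \sum_{k=1}^{K}\log_2\left|\mathbf{I}_{N_S} + \frac{P\boldsymbol{\Lambda}_k\left(\mathbf{M}_k^H\mathbf{W}_k^H\mathbf{W}_k\mathbf{M}_k\right)^{-1}\left(\boldsymbol{\Sigma}_k^{(N_S)}\right)^2}{\sigma^2KN_S}\right| \overset{(i)}{=} \sum_{k=1}^{K}\log_2\left|\mathbf{I}_{N_S} + \frac{P\boldsymbol{\Lambda}_k\left(\boldsymbol{\Sigma}_k^{(N_S)}\right)^2}{\sigma^2KN_S}\right|, \tag{21}$$

where $\boldsymbol{\Lambda} = \mathrm{diag}\{\boldsymbol{\Lambda}_1, \boldsymbol{\Lambda}_2, \ldots, \boldsymbol{\Lambda}_K\}$ is a $KN_S \times KN_S$ diagonal matrix that performs water-filling power allocation, and $\boldsymbol{\Sigma}_k^{(N_S)}$ represents the first $N_S \times N_S$ block partition of $\boldsymbol{\Sigma}_k$. As we choose DFT bases (or any other orthogonal bases) to construct the $\mathbf{W}_k$'s, the simplification step (i) of (21) holds due to $\mathbf{M}_k^H\mathbf{W}_k^H\mathbf{W}_k\mathbf{M}_k = (\mathbf{U}_k^{(N_S)})^H\mathbf{U}_k^{(N_S)} = \mathbf{I}_{N_S}$.

On the other hand, since the RF and baseband processing do not require any information about the propagation paths of the channels $\mathbf{H}_k$'s, namely, each term in the summation of (7), the Hy-BD scheme can be performed on any kind of massive MU-MIMO channel as long as the channel matrices are provided.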
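The sort-and-select solution of the DFT-constrained combiner problem (13) in Section III-A can be sketched in a few lines. The following standard-library Python snippet is an illustration under stated assumptions: the function names (`dft_codebook`, `l1_score`, `select_rf_combiner`) and the toy single-path channel are hypothetical and not part of the paper.

```python
import cmath
import math

def dft_codebook(n_ms):
    """The N_MS candidate vectors d(2*pi*q/N_MS), q = 0, ..., N_MS-1, from (12)."""
    scale = 1.0 / math.sqrt(n_ms)
    return [[scale * cmath.exp(1j * 2.0 * math.pi * q * n / n_ms)
             for n in range(n_ms)] for q in range(n_ms)]

def l1_score(d, h):
    """1-norm of the row vector d^H H, with h given as a list of N_MS rows."""
    n_bs = len(h[0])
    return sum(abs(sum(d[r].conjugate() * h[r][c] for r in range(len(d))))
               for c in range(n_bs))

def select_rf_combiner(h, m_ms):
    """Sort the DFT candidates by ||d^H H||_1 and keep the best m_ms columns."""
    book = dft_codebook(len(h))
    order = sorted(range(len(book)), key=lambda q: l1_score(book[q], h),
                   reverse=True)
    chosen = order[:m_ms]
    return chosen, [book[q] for q in chosen]

# Toy single-path channel H = a_MS a_BS^H with N_MS = 8 and N_BS = 4, whose
# receive response coincides with DFT candidate q = 3 (an assumption made so
# that the best candidate is known in advance).
n_ms, n_bs = 8, 4
a_ms = dft_codebook(n_ms)[3]
a_bs = [0.5 * cmath.exp(1j * 0.7 * c) for c in range(n_bs)]
h_k = [[a_ms[r] * a_bs[c].conjugate() for c in range(n_bs)] for r in range(n_ms)]
indices, w_columns = select_rf_combiner(h_k, 2)
```

Since sorting by the 1-norm and by its square give the same order, the sketch ranks candidates by $\|\mathbf{d}(\omega)^H\mathbf{H}_k\|_1$ directly. The selected columns are distinct DFT vectors, hence orthonormal, which is the property used in step (i) of (21).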


C. Proportional Water-Filling Power Allocation for Weighted Sum-Rate Maximization

After employing the Hy-BD processing scheme, the optimal power allocation for the transmitted data streams can be achieved by water-filling due to the sum-rate (sum-log) maximization form in (21). However, in real-world scenarios, where fairness among users is often a concern, the pure sum-rate maximization in (21) is not enough to guarantee the performance of some high-priority MSs or MSs located farther away from the BS. Therefore, weighted sum-rate maximization is a more suitable objective when allocating transmission power to achieve proportional fairness. That is,

$$\begin{aligned}\max_{\boldsymbol{\Lambda}}\ \ & R = \sum_{k=1}^{K}w_k\log_2\left|\mathbf{I}_{N_S} + \frac{P\boldsymbol{\Lambda}_k\left(\boldsymbol{\Sigma}_k^{(N_S)}\right)^2}{\sigma^2KN_S}\right| \\ \text{s.t.}\ \ & \mathrm{trace}\{\boldsymbol{\Lambda}\} = KN_S, \\ & \lambda_n \ge 0,\ \text{for } n = 1, \ldots, KN_S,\end{aligned} \tag{22}$$

where $w_k$ is the positive weight for the achievable rate of the $k$-th MS. Slightly abusing the notation, we write the $n$-th diagonal elements of $\boldsymbol{\Lambda}$ and $\frac{P}{\sigma^2KN_S}\mathrm{diag}\{(\boldsymbol{\Sigma}_1^{(N_S)})^2, (\boldsymbol{\Sigma}_2^{(N_S)})^2, \ldots, (\boldsymbol{\Sigma}_K^{(N_S)})^2\}$ as $\lambda_n$ and $\gamma_n$ respectively. Then (22) can be rewritten as a convex optimization problem

$$\begin{aligned}\min_{\{\lambda_n\}}\ \ & -\sum_{n=1}^{KN_S}w_n\ln(1 + \gamma_n\lambda_n) \\ \text{s.t.}\ \ & \sum_{n=1}^{KN_S}\lambda_n = KN_S, \\ & \lambda_n \ge 0,\ \text{for } n = 1, \ldots, KN_S,\end{aligned} \tag{23}$$

where $w_{(k-1)N_S+i} = w_k$, $k = 1, \ldots, K$ and $i = 1, \ldots, N_S$. Similar to Example 5.2 in [21], we introduce Lagrange multipliers $\{\mu_1, \ldots, \mu_{KN_S}\} \in \mathbb{R}^{KN_S}$ for the inequality constraints $\lambda_n \ge 0$ and a multiplier $\nu \in \mathbb{R}$ for the equality constraint $\sum_{n=1}^{KN_S}\lambda_n = KN_S$, and the KKT conditions are

$$\begin{aligned}& \sum_{n=1}^{KN_S}\lambda_n = KN_S,\quad \lambda_n \ge 0,\quad \mu_n \ge 0,\quad \mu_n\lambda_n = 0, \\ & -\frac{w_n\gamma_n}{1 + \gamma_n\lambda_n} - \mu_n + \nu = 0,\quad \text{for } n = 1, \ldots, KN_S.\end{aligned} \tag{24}$$

Then we can directly obtain that $\lambda_n\left(\nu - \frac{w_n\gamma_n}{1 + \gamma_n\lambda_n}\right) = 0$, which results in

$$\lambda_n = w_n\max\left\{\frac{1}{\nu} - \frac{1}{w_n\gamma_n}, 0\right\} = \max\left\{\frac{w_n}{\nu} - \frac{1}{\gamma_n}, 0\right\}, \tag{25}$$

where $\nu$ is determined by $\sum_{n=1}^{KN_S}\lambda_n = \sum_{n=1}^{KN_S}\max\left\{\frac{w_n}{\nu} - \frac{1}{\gamma_n}, 0\right\} = KN_S$. This solution is a revised version of the traditional water-filling power allocation that takes the weights of all MSs into account, termed proportional water-filling. An insight from (25) can be interpreted as variable water-levels: an MS with a greater weight $w_n = w_k$ has a higher water-level $\frac{w_n}{\nu}$, so more power can be allocated to it, and vice versa.

D. Performance Analysis in ULA Single-Path Channels 

Due to the discretization of receive vectors in analog combiners and the baseband block diagonalization, analyzing the sum spectral efficiency of the hybrid BD scheme is indeed non-trivial. Nevertheless, it is tractable to present the performance analysis of a special case with ULA single-path channels and large numbers of transmit and receive antennas ($N_{BS}, N_{MS} \to \infty$). Note that, in mmWave channels, both the BS and MSs need to employ large antenna arrays to harvest adequate receive power from the signals passing through a few propagation paths [18]. To conduct the performance analysis, we impose the assumption that each MS only schedules one data stream through a single RF chain, that is, $N_S = M_{MS} = 1$, while the BS is equipped with $M_{BS} = K$ RF chains. Herein, the single-path channel for the $k$-th MS in (7) can be rewritten as

$$\mathbf{H}_k = \sqrt{N_{BS}N_{MS}}\,\alpha^k\,\mathbf{a}_{MS}(\theta^k)\,\mathbf{a}_{BS}^H(\phi^k), \tag{26}$$

where $\alpha^k$ is the large scale path fading $\sqrt{\beta_k}$ multiplied by the complex gain of this unique path, while $\mathbf{a}_{MS}(\theta^k)$ and $\mathbf{a}_{BS}(\phi^k)$ are the corresponding receive and transmit array response vectors. Besides, the analog combiner has only one column, denoted as $\mathbf{W}_k = \mathbf{w}_k$. With a large number of receive antennas $N_{MS}$, the candidates of DFT bases in problem (13) will have an infinite resolution. Under this circumstance, we have


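$$\begin{aligned}\max_{\mathbf{W}_k}\sum_{m=1}^{M_{MS}}\left\|(\mathbf{w}_k^{(m)})^H\mathbf{H}_k\right\|_1^2 &= \max_{\mathbf{w}_k}\left\|\mathbf{w}_k^H\mathbf{H}_k\right\|_1^2 \\ &= \max_{\mathbf{w}_k}\left\|\sqrt{N_{BS}N_{MS}}\,\alpha^k\left[\mathbf{w}_k^H\mathbf{a}_{MS}(\theta^k)\right]\mathbf{a}_{BS}^H(\phi^k)\right\|_1^2 \\ &= \max_{\mathbf{w}_k}\left\{\sqrt{N_{BS}N_{MS}}\,|\alpha^k|\left|\mathbf{w}_k^H\mathbf{a}_{MS}(\theta^k)\right| \cdot \left\|\mathbf{a}_{BS}^H(\phi^k)\right\|_1\right\}^2 \\ &= \max_{\mathbf{w}_k}\left\{N_{BS}\sqrt{N_{MS}}\,|\alpha^k|\left|\mathbf{w}_k^H\mathbf{a}_{MS}(\theta^k)\right|\right\}^2.\end{aligned} \tag{27}$$

Therefore, the analog combiner for the $k$-th MS should be approximately $\mathbf{w}_k = \mathbf{a}_{MS}(\theta^k)$, selected from the DFT bases of infinite resolution when $N_{MS} \to \infty$.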

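Furthermore, the entries in the baseband equivalent channel $\mathbf{H}_{eq}$ can be determined by applying EGT. Defining an operator $g(\cdot)$ that sets the amplitudes of the elements of the input vector to unity, the $(k, j)$-th entry of $\mathbf{H}_{eq}$ is given by

$$\begin{aligned}H_{eq}^{(k,j)} &= \frac{1}{\sqrt{N_{BS}}}\left[\mathbf{w}_k^H\mathbf{H}_k \cdot g\left((\mathbf{w}_j^H\mathbf{H}_j)^H\right)\right] \\ &= \frac{1}{\sqrt{N_{BS}}}\sqrt{N_{BS}N_{MS}}\,\alpha^k\,\mathbf{a}_{BS}^H(\phi^k) \cdot g\left(\sqrt{N_{BS}N_{MS}}\,(\alpha^j)^*\,\mathbf{a}_{BS}(\phi^j)\right) \\ &= \sqrt{N_{BS}N_{MS}}\,\alpha^k\,\mathbf{a}_{BS}^H(\phi^k) \cdot \mathbf{a}_{BS}(\phi^j),\end{aligned} \tag{28}$$

where the last step holds up to the unit-modulus phase factor of $(\alpha^j)^*$ introduced by $g(\cdot)$, which does not change the magnitude of any entry.

With the $N_{BS}$-element ULA antenna setting, it is intuitive that $\mathbf{a}_{BS}^H(\phi^k) \cdot \mathbf{a}_{BS}(\phi^k) = 1$, while

$$\mathbf{a}_{BS}^H(\phi^k) \cdot \mathbf{a}_{BS}(\phi^j) = \frac{1}{N_{BS}}\sum_{n=0}^{N_{BS}-1}e^{j\frac{2\pi}{\lambda}nd(\sin\phi^j - \sin\phi^k)} = \frac{1}{N_{BS}} \cdot \frac{1 - e^{j\frac{2\pi}{\lambda}d(\sin\phi^j - \sin\phi^k)N_{BS}}}{1 - e^{j\frac{2\pi}{\lambda}d(\sin\phi^j - \sin\phi^k)}}. \tag{29}$$

Without loss of generality, we assume that $\sin\phi^j - \sin\phi^k \ne 0$ as long as $k \ne j$. Then we can safely conclude that

$$\lim_{N_{BS} \to \infty}\left|\frac{H_{eq}^{(k,j)}}{H_{eq}^{(k,k)}}\right| = \lim_{N_{BS} \to \infty}\left|\mathbf{a}_{BS}^H(\phi^k) \cdot \mathbf{a}_{BS}(\phi^j)\right| = \lim_{N_{BS} \to \infty}\frac{1}{N_{BS}}\left|\frac{1 - e^{j\frac{2\pi}{\lambda}d(\sin\phi^j - \sin\phi^k)N_{BS}}}{1 - e^{j\frac{2\pi}{\lambda}d(\sin\phi^j - \sin\phi^k)}}\right| = 0, \tag{30}$$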

where $k \neq j$. Therefore, the baseband equivalent channel
can be approximated as a diagonal matrix after analog precoding
and combining, given by $\mathbf{H}_{eq} = \sqrt{N_{BS}N_{MS}}\,\mathrm{diag}\{\alpha^1, \alpha^2, \cdots, \alpha^K\}$,
because the off-diagonal entries are infinitesimal compared
with the diagonal entries as $N_{BS}, N_{MS} \to \infty$. Consequently,
no block diagonalization is needed at the baseband; only
water-filling power allocation is required
to achieve the optimal sum spectral efficiency as


R Ki log 2 



a^K ) ’ 


(31) 


where $\mathbf{\Lambda}$ is a diagonal matrix that performs water-filling power
allocation. Eq. (31) is an approximate sum spectral efficiency
under the settings of ULA single-path channels and a large
number of transmit and receive antennas, and we present
this analytical result in the simulations.
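
The two ingredients of this analysis can be checked numerically with a
short sketch, given here under stated assumptions rather than as the
authors' code. The helper names are hypothetical, the antenna spacing
is assumed to be $d/\lambda = 0.5$, and the water-filling routine assumes
unit noise power and a total power constraint; the exact power
normalization used in (31) is not reproduced. `ula_correlation`
evaluates the direct sum in (29), and `water_filling` allocates power
over the parallel channels of a diagonal $\mathbf{H}_{eq}$.

```python
import cmath
import math


def ula_correlation(n_ant, sin_k, sin_j, spacing=0.5):
    """a^H(phi_k) a(phi_j) for unit-norm ULA responses (direct sum in (29))."""
    delta = sin_j - sin_k
    total = sum(cmath.exp(2j * math.pi * spacing * n * delta)
                for n in range(n_ant))
    return total / n_ant


def water_filling(gains, total_power, noise_power=1.0):
    """Water-filling over parallel channels with power gains `gains`.

    Returns (powers, rate) with the rate in bits/s/Hz.
    """
    inv = sorted(noise_power / g for g in gains)
    active = len(inv)
    while active > 0:
        # Candidate water level if only the `active` best channels are used.
        level = (total_power + sum(inv[:active])) / active
        if level > inv[active - 1]:
            break
        active -= 1  # weakest active channel would get negative power
    powers = [max(level - noise_power / g, 0.0) for g in gains]
    rate = sum(math.log2(1.0 + p * g / noise_power)
               for p, g in zip(powers, gains))
    return powers, rate


if __name__ == "__main__":
    # Off-diagonal to diagonal ratio of H_eq shrinks as N_BS grows.
    for n_bs in (16, 64, 256, 1024):
        print(n_bs, abs(ula_correlation(n_bs, 0.1, 0.4)))

    # Diagonal H_eq = sqrt(N_BS * N_MS) * diag(alpha^1, ..., alpha^K).
    n_bs, n_ms = 256, 16
    alphas = [0.9, 1.2 - 0.3j]  # illustrative path gains (assumed values)
    gains = [n_bs * n_ms * abs(a) ** 2 for a in alphas]
    print(water_filling(gains, total_power=0.01))
```

The magnitude of the correlation is bounded by
$1/(N_{BS}\,|\sin(\pi (d/\lambda)(\sin\phi^j - \sin\phi^k))|)$, so it decays as
$1/N_{BS}$ for any fixed pair of distinct angles.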


IV. Simulation Results 

In this section, we evaluate the spectral efficiency achieved 
by the Hy-BD scheme as well as its performance robustness 
in the massive MU-MIMO channels. 


A. Spectral Efficiency Evaluation 

In the simulations of this section, we illustrate the spectral
efficiency achieved by the Hy-BD scheme in massive MU-MIMO
systems by comparing it with the traditional high-dimensional
baseband BD scheme in large i.i.d. Rayleigh fading
and mmWave multiuser channels, and also with the previously
proposed spatially sparse precoding/combining scheme
Ea in mmWave channels. The signal-to-noise
ratio SNR = $P/\sigma^2$ ranges from -40 dB to 0 dB for all processing
solutions. The large-scale fading path loss factors
of all $K$ MSs are uniformly distributed in [0.5, 1.5].
All MSs have equal unit weights in the simulations.

Fig. 2 illustrates the sum spectral efficiency achieved by the
traditional BD scheme and our proposed Hy-BD scheme in
the large i.i.d. Rayleigh fading channel. The BS with $M_{BS} = 16$
RF chains is employed to schedule $K = 8$ MSs, each
of which processes $N_s = 2$ data streams with $M_{MS} = 2$
RF chains. Furthermore, the BS and each MS are equipped with
256 and 16 antennas (64 and 4 antennas in the smaller setting),
respectively. In both the 256 × 16 and 64 × 4 antenna settings,
the sum spectral efficiency of the Hy-BD scheme consistently
approaches the performance achieved by the traditional BD
scheme, but with lower implementation and computational
complexity. Notably, the results of the 64 × 4 antenna setting
indicate that the Hy-BD scheme is still effective in a
small-scale antenna system.

In the mmWave MU-MIMO channels, the traditional full- 
complexity BD and Hy-BD schemes perform in a similar 
fashion as in the Rayleigh fading channels. Based on the 
limited number of paths scattered in the mmWave channels, 
the spatially sparse precoding/combining scheme in Ea can 
be extended to the hybrid processing in MU-MIMO systems
by decomposing the solution to the traditional BD
scheme (the precoder M 5 and the MMSE combiners in 1^) 
via orthogonal matching pursuit where the BS and MSs choose 
the array response vectors of the corresponding AoDs and 
AoAs as the basis vectors, respectively. Fig. 3 shows the
sum spectral efficiency of the above processing schemes with
ULA and UPA employed, respectively.¹ We set the mmWave
propagation channel with $N_c = 8$ and $N_p = 10$. The range of
the mean azimuth angles of AoDs at the BS is
120°, while the MSs are assumed to be omni-directional due
to their relatively small number of antenna elements. All angle
spreads are equal to 7.5° (the settings
of azimuth angles are also applied to elevation angles in
the UPA setup). Moreover, the BS is set to have $N_{BS} = 256$
antennas and $M_{BS} = 16$ RF chains, while the $K = 8$ MSs each have
$N_{MS} = 16$ antennas and $M_{MS} = 2$ RF chains, all dealing
with $N_s = 2$ data streams. In this scenario, the proposed Hy-BD
scheme even achieves slightly higher spectral efficiency
than the traditional BD scheme, while the performance of
the Hy-BD scheme and the spatially sparse coding scheme
improves when the system applies UPA instead of ULA. Note
that the traditional BD scheme is a sub-optimal solution for 
the processing of MU-MIMO systems, and it is possible that 
the Hy-BD outperforms the traditional BD in some situations. 
As for the spatially sparse precoding/combining scheme, it 
lags behind the traditional BD and Hy-BD schemes because 
the columns of the traditional BD precoding and combining 
matrices do not directly come from the linear combination of 
the array response vectors of AoDs/AoAs, the basic forming 
units of the RF matrices in the spatially sparse coding scheme 
ESI. This is very different from the P2P scenario that the
spatially sparse scheme is designed for, where the columns
of the SVD based precoder and combiner can be effectively 
approached by the linear combinations of the array response 
vectors according to the observation 3) in EH- Even though 
the number of RF chains is enlarged to Mms = 4 and 
Mbs = 32, the performance of the spatially sparse
precoding/combining scheme is still inferior to the full-complexity
BD and Hy-BD schemes. 

¹ Under the UPA setup, it is necessary to introduce an extra elevation angle
for each propagation path. In the simulations, we use the same settings for
both elevation and azimuth angles.

Fig. 2: Sum spectral efficiency achieved by different processing
schemes in an 8-user MU-MIMO system in i.i.d. Rayleigh
fading channels where $N_s = 2$, $M_{MS} = 2$, $M_{BS} = 16$.

Considering the critical situation in which only one data stream
is supported by each MS with only one RF chain employed (8 MSs
in total), we are able to further compare our results with the
limited feedback hybrid precoding scheme proposed in ifTsl
in Fig. 4. It shows that the proposed Hy-BD scheme still
outperforms the other baselines. Although the limited feedback
hybrid precoding scheme is capable of tracking the strongest
path in the mmWave channels, it fails to harvest the large array
gain when the mmWave channel for each MS is not extremely
sparse, since only one RF chain pair is available for each MS to
track one propagation path in the RF domain (we generate 80
paths for each MS's mmWave channel in the simulations of
Fig. 4). However, with the Hy-BD scheme, the EGT enabled
by the RF precoder can directly aggregate the channel gains
so that the spectral efficiency performance can be guaranteed.
Furthermore, the approximate sum spectral efficiency of the hybrid
BD scheme in ULA single-path channels, analyzed in Section
III-D, is illustrated in Fig. 5, where $M_{MS} = N_s = 1$ and
$M_{BS} = K = 2$. It shows that the hybrid BD scheme performs close
to its analytical approximation, with about 1 bps/Hz
degradation, which is caused by the limited numbers of transmit
and receive antennas as well as the DFT restriction. In this
circumstance, the hybrid BD scheme and the limited feedback
method obtain similar performance since they are both capable
of tracking the channel's unique path.
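
The contrast between EGT and single-path tracking can be illustrated
with a small sketch. It is a simplified setting, not the simulation
used for Fig. 4: a single-antenna receiver is assumed, the path
directions are drawn uniformly in $\sin\phi$ instead of from clustered
angle spreads, and all function names are hypothetical. For a row
channel $\mathbf{h}$, the EGT precoder yields the gain $\|\mathbf{h}\|_1/\sqrt{N}$,
while a single steering beam yields $|\mathbf{h}\,\mathbf{a}(\phi_l)|$ for the best
path $l$.

```python
import cmath
import math
import random


def steering(n_ant, sin_angle, spacing=0.5):
    """Unit-norm ULA steering vector; `spacing` is d/lambda (assumed 0.5)."""
    return [cmath.exp(2j * math.pi * spacing * n * sin_angle) / math.sqrt(n_ant)
            for n in range(n_ant)]


def multipath_row_channel(n_ant, n_paths, rng):
    """h = sum_l alpha_l a^H(phi_l); returns (h, list of sin(phi_l))."""
    h = [0j] * n_ant
    sines = []
    for _ in range(n_paths):
        alpha = complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2 * n_paths)
        sin_angle = rng.uniform(-1.0, 1.0)  # assumed angle distribution
        a = steering(n_ant, sin_angle)
        for n in range(n_ant):
            h[n] += alpha * a[n].conjugate()
        sines.append(sin_angle)
    return h, sines


def egt_gain(h):
    """|h f| for f_n = exp(-j * arg(h_n)) / sqrt(N): co-phases every entry."""
    return sum(abs(x) for x in h) / math.sqrt(len(h))


def best_beam_gain(h, sines):
    """Largest |h a(phi_l)| over steering beams pointed at the paths."""
    n_ant = len(h)
    return max(abs(sum(hn * an for hn, an in zip(h, steering(n_ant, s))))
               for s in sines)


if __name__ == "__main__":
    rng = random.Random(0)
    for n_paths in (1, 8, 80):
        h, sines = multipath_row_channel(64, n_paths, rng)
        print(n_paths, egt_gain(h), best_beam_gain(h, sines))
```

Because every steering vector has entries of modulus $1/\sqrt{N}$, its
gain can never exceed the EGT gain; the two coincide for a single-path
channel, which matches the observation that both methods track the
channel's unique path equally well in that case.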
Fig. 3: Sum spectral efficiency achieved by different processing
schemes in a 256 × 16 8-user MU-MIMO system in
mmWave channels where $N_s = 2$, $M_{MS} = 2\,(4)$, $M_{BS} = 16\,(32)$.

Fig. 4: Sum spectral efficiency achieved by different processing
schemes in a 256 × 16 8-user MU-MIMO system in
mmWave channels where $N_s = 1$, $M_{MS} = 1$, $M_{BS} = 8$.

Fig. 5: Sum spectral efficiency achieved by different processing
schemes in ULA single-path channels where $M_{MS} = N_s = 1$,
$M_{BS} = K = 2$.


B. Robustness Evaluation

In addition to simply demonstrating the spectral efficiency
of the Hy-BD scheme under different SNRs, we further examine
its performance robustness by changing the multiplexing
settings (e.g., the number of data streams supported by each
user and the number of users) and introducing the channel
estimation error.

For the practical implementation of an MU-MIMO system,
the total number of supported data streams is a very important
criterion for evaluating the system performance. It depends
on the number of supported MSs $K$ and the number of data
streams supported by each MS $N_s$, namely, on space-division
multiple access and spatial multiplexing. In Figs. 6 and 7, the sum
spectral efficiency achieved by the traditional BD scheme and
the Hy-BD scheme is examined in a 256 × 16 8-user MU-MIMO
system in i.i.d. Rayleigh fading channels under different SNRs,
where each MS only employs $M_{MS} = N_s$ RF chains ($K N_s$
RF chains at the BS) in the Hy-BD scheme.

In Fig. 6, the number of data streams per MS is set as
$N_s = 1, 2, 4$ and the SNR ranges from -40 dB to 0 dB.
The gap between the sum spectral efficiency of the traditional
BD scheme and the Hy-BD scheme remains minute compared
to the absolute sum spectral efficiency. However, the Hy-BD
scheme only needs the same number of RF chains as the
supported data streams at both the BS and MSs (up to $M_{BS} = 32$
and $M_{MS} = 4$), much smaller than that of the traditional BD
scheme ($M_{BS} = 256$ and $M_{MS} = 16$). Fig. 7 shows the
sum spectral efficiency of both schemes when $N_s$ increases
from 1 to 16 and the SNR is set as -10, -5 and 0 dB, which
indicates that it is suitable to employ the Hy-BD scheme when 
the total number of data streams in the MU-MIMO system is 
not too large, so that the Hy-BD can reach the peak spectral 
efficiency. As we can see, the sum spectral efficiency achieved 
by the traditional BD scheme will be continuously augmented 
in such a 256 x 16 8-user MU-MIMO system when the number 
of transmitted data streams increases, since more equivalent 
parallel channels (characterized by the diagonal elements in 
S in 0) can be utilized to transmit the data streams and 
the effect of inter-user interference is not dominant in this 
case. However, in the Hy-BD scheme, the spectral efficiency
performance is somewhat compromised once a large number
of data streams are transmitted. This is because the pursuit
of the large array gain introduces slight inter-stream
interference in the RF domain, which degrades the system
spectral efficiency after the baseband BD processing. On the
other hand, with an increasing SNR, the suitable number of
supported data streams $N_s$, corresponding to the peak
spectral efficiency, also increases for both the traditional BD
scheme and the Hy-BD scheme. For instance, when SNR = 0 dB, the
traditional BD scheme supports up to $K \times N_s = 8 \times 16 = 128$
data streams, which is the maximum number of data streams
supported by a 256 × 16 8-user MU-MIMO system with
full RF chains. However, the Hy-BD scheme can support about
$K \times N_s = 8 \times 8 = 64$ data streams with only 64 and 8 RF
chains at the BS and each MS, respectively.
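
The stream and RF-chain counts quoted above follow from simple
bookkeeping, sketched below. The function name and the dictionary keys
are hypothetical; the only rules used are the ones stated in the text,
namely $M_{MS} = N_s$ and $M_{BS} = K N_s$ for the Hy-BD scheme and one RF
chain per antenna for the traditional BD scheme.

```python
def rf_chain_budget(num_users, streams_per_user, n_bs, n_ms):
    """Compare RF-chain requirements of Hy-BD and full-complexity BD.

    Hy-BD uses one RF chain per data stream at both ends; traditional BD
    is assumed to use one RF chain per antenna element.
    """
    total_streams = num_users * streams_per_user
    return {
        "total_streams": total_streams,
        "hybrid_bs_chains": total_streams,
        "hybrid_ms_chains": streams_per_user,
        "full_bs_chains": n_bs,
        "full_ms_chains": n_ms,
        # Streams cannot exceed the antennas available at either end.
        "feasible": total_streams <= n_bs and streams_per_user <= n_ms,
    }


if __name__ == "__main__":
    # 256 x 16 8-user system with 8 and 16 streams per MS.
    print(rf_chain_budget(8, 8, n_bs=256, n_ms=16))
    print(rf_chain_budget(8, 16, n_bs=256, n_ms=16))
```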

With the same system configuration as that of Figs. 6 and
7 and the number of data streams per MS set as $N_s = 4$,
the number of MSs $K$ increases from 1 to 16 in Fig. 8.
In this case, the traditional BD scheme with full RF chain
configuration reaches a peak spectral efficiency at a certain
$K$. This is because when $K$ grows beyond an optimal value,
inter-user interference becomes substantially more severe and
the sum spectral efficiency gradually degrades. As for the
Hy-BD scheme with the limited RF chain configuration, the
sum spectral efficiency keeps improving when $K$ increases
from 1 to 16 (the maximum number of supported data streams
is still up to $K \times N_s = 16 \times 4 = 64$). Comparing the results of
Figs. 7 and 8, the Hy-BD scheme can be safely recommended
for implementation in systems with a large number of MSs,
each of which deals with a small number of data
streams, since it is less vulnerable to inter-user interference
than the traditional BD scheme in this case. As for the case in
Fig. 7, where there are fewer MSs and more data streams per
MS, the traditional BD scheme achieves superior performance
at the cost of high complexity because it can better handle
the inter-stream interference than the Hy-BD scheme.



Fig. 6: Sum spectral efficiency achieved by different processing
schemes in an 8-user MU-MIMO system in i.i.d.
Rayleigh fading channels where $N_s = M_{MS} = 1, 2, 4$ and
$M_{BS} = 8M_{MS}$.

Fig. 7: Sum spectral efficiency achieved by different processing
schemes in a 256 × 16 8-user MU-MIMO system in i.i.d.
Rayleigh fading channels where $N_s$ increases from 1 to 16
and SNR = -10, -5, 0 dB.

Fig. 8: Sum spectral efficiency achieved by different processing
schemes in a 256 × 16 MU-MIMO system in i.i.d. Rayleigh
fading channels where $K$ increases from 1 to 16, SNR =
-10, -5, 0 dB and $N_s = 4$.

Furthermore, we examine the sum spectral efficiency of both
schemes with an increasing $N_s$ or $K$ under different SNRs
(-10, -5 and 0 dB) in the mmWave MU-MIMO channels
whose propagation characteristics follow the settings of Fig. 3.
The BS and MS configurations are the same as those of Figs.
7 and 8. Here, Fig. 9 illustrates the sum spectral efficiency
of both schemes when $N_s$ increases from 1 to 16 with
$K = 8$, while Fig. 10 gives the result for the number of
MSs $K$ increasing from 1 to 16 with $N_s = 4$. As can be
seen, the general trends of the sum spectral efficiency of the
traditional BD scheme and the Hy-BD scheme in mmWave
channels are consistent with those in Rayleigh fading channels,
except that the Hy-BD scheme performs slightly better
in mmWave channels than in Rayleigh
fading channels. This is probably because the DFT
basis selection (conforming to the forms of the AoA/AoD array
responses of the limited number of paths in mmWave channels)
in the Hy-BD scheme essentially captures the dominant paths
of the mmWave channels.



Fig. 9: Sum spectral efficiency achieved by different processing
schemes in a 256 × 16 8-user MU-MIMO system in
mmWave channels where $N_s$ increases from 1 to 16 and SNR
= -10, -5, 0 dB.

Fig. 10: Sum spectral efficiency achieved by different processing
schemes in a 256 × 16 MU-MIMO system in mmWave
channels where $K$ increases from 1 to 16, SNR = -10, -5, 0
dB and $N_s = 4$.


V. Conclusion 

In this paper, a low-complexity hybrid block diagonalization
processing scheme has been proposed for the downlink
communication of a massive multiuser MIMO system with
a limited number of RF chains. We harvest the large array
gain through phase-only RF precoding and combining,
and then the BD technique is performed on the equivalent
baseband channel. It has been demonstrated that the Hy-BD
scheme, with a lower implementation and computational
complexity, achieves a capacity performance approaching that
of the traditional high-dimensional baseband BD processing.
Such a low-complexity, low-cost Hy-BD scheme can be a
promising option for the practical implementation of a massive
MU-MIMO system.